zebra: fix SRv6 encap lost during recursive nexthop resolution #21519

Open
dawkopagh wants to merge 2 commits into FRRouting:master from dawkopagh:dkopec/zebra/fix-recursive-srv6-nexthop

@dawkopagh

Description

When resolving a recursive nexthop, nexthop_set_resolved() copied MPLS labels from both the resolver's FIB nexthop (newhop) and the parent nexthop, but copied SRv6 info only from the parent. As a result, an IPv4 route whose nexthop resolved through an SRv6 VPN route was installed with encap mpls instead of encap seg6, silently breaking traffic forwarding.

Fix

The fix adds a newhop->nh_srv6 copy block before the existing nexthop->nh_srv6 block, mirroring the MPLS label stacking logic. Both the seg6local action and the seg6 SID stack are propagated, and a sid_zero() guard prevents copying an uninitialised SID.
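The ordering described above can be modelled outside zebra. What follows is a minimal Python sketch, not FRR code: the `Srv6` and `Nexthop` classes, `has_usable_sids()`, `set_resolved()` and the `copy_newhop_srv6` switch are hypothetical stand-ins for the C structures and functions. It models only the seg6 SID stack (not the seg6local action), and it assumes the zero-SID guard can be approximated by checking whether the first SID is the unspecified address.

```python
from dataclasses import dataclass, field
from ipaddress import IPv6Address
from typing import List, Optional


@dataclass
class Srv6:
    # Hypothetical stand-in for the seg6 SID stack carried by a nexthop.
    segs: List[str] = field(default_factory=list)


@dataclass
class Nexthop:
    # Hypothetical stand-in for a zebra nexthop: MPLS labels plus optional SRv6 info.
    labels: List[int] = field(default_factory=list)
    srv6: Optional[Srv6] = None


def has_usable_sids(srv6: Optional[Srv6]) -> bool:
    """Model of the guard: SRv6 info present, stack non-empty, first SID not all-zero."""
    if srv6 is None or not srv6.segs:
        return False
    return not IPv6Address(srv6.segs[0]).is_unspecified


def set_resolved(newhop: Nexthop, parent: Nexthop,
                 copy_newhop_srv6: bool = True) -> Nexthop:
    """Build the resolved nexthop from the resolver's FIB nexthop and the parent."""
    resolved = Nexthop()
    # Labels are taken from both sides: resolver first, then parent.
    resolved.labels = list(newhop.labels) + list(parent.labels)
    # The added block: take SRv6 info from the resolver's FIB nexthop.
    if copy_newhop_srv6 and has_usable_sids(newhop.srv6):
        resolved.srv6 = Srv6(list(newhop.srv6.segs))
    # The pre-existing block: the parent's SRv6 info wins when it is present.
    if has_usable_sids(parent.srv6):
        resolved.srv6 = Srv6(list(parent.srv6.segs))
    return resolved


# Static route nexthop without SRv6 info, resolving through an SRv6 VPN route.
fib_nh = Nexthop(labels=[16], srv6=Srv6(["2001:db8:3:1::"]))
static_nh = Nexthop()

before = set_resolved(fib_nh, static_nh, copy_newhop_srv6=False)
after = set_resolved(fib_nh, static_nh)
print(before.srv6)      # SRv6 info is lost without the added block
print(after.srv6.segs)  # SID inherited from the resolver
```

With `copy_newhop_srv6=False` the model behaves like the old code path: the label survives but the SID does not, which matches the `encap mpls` symptom described above.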

Testing

Before fix:

(Pdb) print(r1.run("ip route show vrf test1"))
198.0.0.150 nhid 12  encap mpls  16 via inet6 fe80::98de:f8ff:fe1b:5766 dev eth0 proto 196 metric 20

(Pdb) print(r1.run("ip -6 route show vrf test1"))
2600:1000::101 nhid 9  encap seg6 mode encap segs 1 [ 2001:db8:3:1:: ] via fe80::98de:f8ff:fe1b:5766 dev eth0 proto bgp metric 20 pref medium

(Pdb) print(r1.vtysh_cmd("show ip route vrf test1"))
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

IPv4 unicast VRF test1:
S>  198.0.0.150/32 [1/0] via 2600:1000::101 (recursive), weight 1, 00:27:57
  *                        via fe80::98de:f8ff:fe1b:5766, eth0, label 16, weight 1, 00:27:57

(Pdb) print(r1.vtysh_cmd("show ipv6 route vrf test1"))
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIPng, O - OSPFv3, I - IS-IS, B - BGP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

IPv6 unicast VRF test1:
B>* 2600:1000::101/128 [20/0] via fe80::98de:f8ff:fe1b:5766, eth0 (vrf default), label 16, seg6 2001:db8:3:1::, weight 1, 00:39:51

After fix:

(Pdb) print(r1.run("ip route show vrf test1"))
198.0.0.150 nhid 12  encap seg6 mode encap segs 1 [ 2001:db8:3:1:: ] via inet6 fe80::d05d:bdff:fefd:2669 dev eth0 proto 196 metric 20

(Pdb) print(r1.run("ip -6 route show vrf test1"))
2600:1000::101 nhid 9  encap seg6 mode encap segs 1 [ 2001:db8:3:1:: ] via fe80::d05d:bdff:fefd:2669 dev eth0 proto bgp metric 20 pref medium

(Pdb) print(r1.vtysh_cmd("show ip route vrf test1"))
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

IPv4 unicast VRF test1:
S>  198.0.0.150/32 [1/0] via 2600:1000::101 (recursive), weight 1, 00:01:12
  *                        via fe80::d05d:bdff:fefd:2669, eth0, label 16, seg6 2001:db8:3:1::, weight 1, 00:01:12

(Pdb) print(r1.vtysh_cmd("show ipv6 route vrf test1"))
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIPng, O - OSPFv3, I - IS-IS, B - BGP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

IPv6 unicast VRF test1:
B>* 2600:1000::101/128 [20/0] via fe80::d05d:bdff:fefd:2669, eth0 (vrf default), label 16, seg6 2001:db8:3:1::, weight 1, 00:01:17
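The difference between the two kernel outputs can also be checked mechanically. The sketch below is a hypothetical helper (`parse_encap()` and its regular expression are not part of FRR or the topotest framework) that extracts the encap type and any seg6 SIDs from a single `ip route` line, assuming the line format shown in the outputs above.

```python
import re
from typing import List, Optional, Tuple

# Matches "encap <type>" and, when present, the "segs N [ sid ... ]" list after it.
ENCAP_RE = re.compile(r"encap (?P<kind>\S+)(?:.*?segs \d+ \[ (?P<sids>[^\]]*?) \])?")


def parse_encap(route_line: str) -> Tuple[Optional[str], List[str]]:
    """Return (encap type, list of SIDs) for one line of `ip route` output."""
    match = ENCAP_RE.search(route_line)
    if match is None:
        return None, []
    sids = match.group("sids")
    return match.group("kind"), sids.split() if sids else []


before_line = ("198.0.0.150 nhid 12  encap mpls  16 via inet6 "
               "fe80::98de:f8ff:fe1b:5766 dev eth0 proto 196 metric 20")
after_line = ("198.0.0.150 nhid 12  encap seg6 mode encap segs 1 [ 2001:db8:3:1:: ] "
              "via inet6 fe80::d05d:bdff:fefd:2669 dev eth0 proto 196 metric 20")
resolver_line = ("2600:1000::101 nhid 9  encap seg6 mode encap segs 1 [ 2001:db8:3:1:: ] "
                 "via fe80::d05d:bdff:fefd:2669 dev eth0 proto bgp metric 20 pref medium")

print(parse_encap(before_line))
print(parse_encap(after_line))
# After the fix, the static route carries the same SID as its resolver route.
print(parse_encap(after_line) == parse_encap(resolver_line))
```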

@greptile-apps

greptile-apps Bot commented Apr 14, 2026

Greptile Summary

This PR fixes a bug in nexthop_set_resolved() where SRv6 encapsulation info from the resolved (FIB) nexthop (newhop->nh_srv6) was silently dropped during recursive nexthop resolution, causing an IPv4 route to be installed with MPLS encap instead of SRv6 seg6 encap. The fix adds a newhop->nh_srv6 copy block mirroring the existing MPLS label stacking logic, and also adds a defensive XREALLOC path in nexthop_add_srv6_seg6 to handle the case where successive calls grow the SID segment stack beyond its initial allocation.

Confidence Score: 5/5

Safe to merge — fix is correct, well-scoped, and covered by a dedicated regression test.

The core change is a straightforward two-block addition mirroring the existing MPLS label-stacking pattern. Memory management via XREALLOC is handled correctly (old pointer freed by realloc, all fields explicitly written after resize). The sid_zero guard is consistent with its usage in the existing nexthop block. No regressions are introduced to existing paths since the new newhop block only runs when newhop->nh_srv6 is non-NULL, which was previously silently ignored. A complete regression test suite exercises the exact kernel FIB output.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| zebra/zebra_nhg.c | Core fix: adds a newhop->nh_srv6 copy block before the existing nexthop->nh_srv6 block in nexthop_set_resolved(), with proper sid_zero() guards; parent nexthop takes precedence when both are non-NULL. |
| lib/nexthop.c | Adds an XREALLOC branch to nexthop_add_srv6_seg6 so successive calls with a growing SID count correctly resize the segment stack instead of silently overflowing the original XCALLOC buffer. |
| tests/topotests/bgp_srv6_recursive_nhop_encap/test_bgp_srv6_recursive_nhop_encap.py | New regression test suite: four tests covering VPN route presence, FRR RIB SRv6 encap, kernel FIB encap (with explicit must_not_contain=["encap mpls"]), and SID consistency between the VPN and static routes. |
| tests/topotests/bgp_srv6_recursive_nhop_encap/r1/frr.conf | r1 config: reproduces the bug with a static IPv4 route that has no SRv6 SID and whose nexthop resolves recursively through an SRv6 VPN route imported from r2. |
| tests/topotests/bgp_srv6_recursive_nhop_encap/r2/frr.conf | r2 config: exports 2600:1000::101/128 from vrf test1 as an IPv6 VPN route with SRv6 SID 2001:db8:3:1:: drawn from locator 2001:db8:3::/48; RT 0:20 matches r1's import policy. |
| tests/topotests/bgp_srv6_recursive_nhop_encap/r1/ipv4_rib_test1.json | Expected RIB JSON for 198.0.0.150/32: asserts seg6 SID 2001:db8:3:1:: and H.Encaps behavior are present in the installed nexthop. |
| tests/topotests/bgp_srv6_recursive_nhop_encap/r1/ipv6_rib_test1.json | Expected RIB JSON for 2600:1000::101/128: prerequisite check asserting the VPN resolver route itself carries the expected SRv6 SID. |
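The kernel FIB check summarised above pairs a positive match with an explicit negative one. A minimal sketch of that idea follows; `check_output()` is a hypothetical helper written for illustration, not the function used in the PR's test file. Returning `None` on success and a message on failure is an assumed convention here.

```python
from typing import Iterable, Optional


def check_output(output: str,
                 must_contain: Iterable[str] = (),
                 must_not_contain: Iterable[str] = ()) -> Optional[str]:
    """Return None if the output passes, otherwise a short failure message."""
    for needle in must_contain:
        if needle not in output:
            return f"missing expected text: {needle!r}"
    for needle in must_not_contain:
        if needle in output:
            return f"found forbidden text: {needle!r}"
    return None


fixed = ("198.0.0.150 nhid 12  encap seg6 mode encap segs 1 [ 2001:db8:3:1:: ] "
         "via inet6 fe80::d05d:bdff:fefd:2669 dev eth0 proto 196 metric 20")
broken = ("198.0.0.150 nhid 12  encap mpls  16 via inet6 "
          "fe80::98de:f8ff:fe1b:5766 dev eth0 proto 196 metric 20")

print(check_output(fixed, ["encap seg6", "2001:db8:3:1::"], ["encap mpls"]))
print(check_output(broken, ["198.0.0.150"], ["encap mpls"]))
```

The negative check matters because a route can contain the expected prefix and still be installed with the wrong encapsulation, which is exactly the failure mode this PR addresses.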

Sequence Diagram

sequenceDiagram
    participant Z as zebra
    participant NHG as nexthop_set_resolved()
    participant NH as nexthop_add_srv6_seg6()

    Note over Z: Recursive resolution of 198.0.0.150/32<br/>nexthop=static NH (nh_srv6=NULL)<br/>newhop=FIB NH of SRv6 VPN route (nh_srv6=SID)

    Z->>NHG: resolve nexthop (newhop, nexthop)
    NHG->>NHG: copy MPLS labels (newhop then nexthop)

    alt newhop->nh_srv6 != NULL (NEW block)
        NHG->>NH: nexthop_add_srv6_seg6(resolved_hop, newhop SID)
        NH->>NH: XCALLOC seg6_segs (first call)
        NH-->>NHG: resolved_hop->nh_srv6 set with newhop SID
    end

    alt nexthop->nh_srv6 != NULL (existing block)
        NHG->>NH: nexthop_add_srv6_seg6(resolved_hop, nexthop SID)
        NH->>NH: XREALLOC if nexthop num_segs > newhop num_segs
        NH-->>NHG: parent SID takes precedence
    else nexthop->nh_srv6 == NULL (bug fix case)
        NHG->>NHG: skip — newhop SID is preserved
    end

    NHG-->>Z: resolved_hop with correct SRv6 encap
    Z->>Z: install route with encap seg6 (not encap mpls)

Reviews (2): Last reviewed commit: "zebra: fix heap overflow and guard incon..." | Re-trigger Greptile

@cscarpitta cscarpitta self-requested a review April 14, 2026 08:45
Comment thread zebra/zebra_nhg.c Outdated
Comment thread zebra/zebra_nhg.c
if (newhop->nh_srv6->seg6_segs &&
    newhop->nh_srv6->seg6_segs->num_segs &&
    !sid_zero(newhop->nh_srv6->seg6_segs))
        nexthop_add_srv6_seg6(resolved_hop,
Contributor


The Greptile issue with these APIs looks valid to me.

Author


fixed in 7212a1e

@github-actions github-actions Bot added the rebase PR needs rebase label Apr 30, 2026
When resolving a recursive nexthop, nexthop_set_resolved() copied MPLS
labels from both the resolver's FIB nexthop (newhop) and the parent
nexthop, but copied SRv6 info only from the parent.  As a result, an
IPv4 route whose nexthop resolved through an SRv6 VPN route was
installed with encap mpls instead of encap seg6, silently breaking
traffic forwarding.

Fix by adding a newhop->nh_srv6 copy block before the existing
nexthop->nh_srv6 block, mirroring the MPLS label stacking logic.
Both seg6local action and seg6 SID stack are propagated; a sid_zero()
guard prevents copying an uninitialised SID.

Signed-off-by: Dawid Kopec <dkopec@akamai.com>
@dawkopagh dawkopagh force-pushed the dkopec/zebra/fix-recursive-srv6-nexthop branch from 38247a1 to 7212a1e Compare April 30, 2026 09:57
@dawkopagh
Author

@greptileai

@dawkopagh dawkopagh requested a review from mjstapp May 4, 2026 08:21